Improving the performance of the matrix inversion on a Tesla GPU
Abstract
We study two techniques for computing a matrix inverse: the traditional approach based on Gaussian elimination (LU factorization), and the Gauss-Jordan elimination alternative, which is better suited to parallel architectures. The target architecture is a current general-purpose multi-core processor (CPU) connected to a graphics processor (GPU). Parallelism is obtained from the MKL (on the CPU) and CUBLAS (on the GPU) libraries, as well as from performing operations simultaneously on both architectures. Numerical experiments on a system equipped with two Intel QuadCore processors and a Tesla C1060 GPU illustrate the efficiency attained by the Gauss-Jordan elimination implementation.
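To make the Gauss-Jordan approach concrete, here is a minimal unblocked sketch in pure Python. It is not the paper's implementation, which relies on MKL and CUBLAS kernels running on the CPU and GPU. The function name `gauss_jordan_inverse`, the `tol` parameter, and the use of partial pivoting are assumptions made for this illustration only.

```python
def gauss_jordan_inverse(a, tol=1e-12):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination.

    Illustrative sketch only: unblocked, scalar, with partial pivoting.
    The name and the `tol` threshold are hypothetical, not from the paper.
    """
    n = len(a)
    # Build the augmented matrix [A | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(a)
    ]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tol:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]

        # Scale the pivot row so the diagonal entry becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]

        # Eliminate the column in every other row, above and below the pivot.
        # Each row update is independent of the others.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                if f != 0.0:
                    aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]

    # The right half of [I | A^-1] now holds the inverse.
    return [row[n:] for row in aug]


if __name__ == "__main__":
    for row in gauss_jordan_inverse([[4, 7], [2, 6]]):
        print(row)
```

In this sketch, every elimination step updates all remaining rows independently, which gives a sense of why the method maps naturally onto parallel hardware. A single sweep also produces the inverse directly, with no separate triangular stages.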
Similar resources
Using Hybrid CPU-GPU Platforms to Accelerate the Computation of the Matrix Sign Function
We investigate the performance of two approaches for matrix inversion based on Gaussian (LU factorization) and Gauss-Jordan eliminations. The target architecture is a current general-purpose multicore processor connected to a graphics processor (GPU). Parallelism is extracted in both processors by linking sequential versions of the codes with multi-threaded implementations of BLAS. Our results ...
A Direct Matrix Inversion-Less Analysis for Distribution System Power Flow Considering Distributed Generation
This paper presents a new direct matrix inversion-less analysis for radial distribution systems (RDSs). The method can successfully deal with weakly meshed distribution systems (WMDSs). Being easy to implement, direct methods (DMs) provide excellent performance. Matrix inversion is the main reason for divergence and low efficiency in power flow algorithms. In this paper, the performance of t...
High-Performance Matrix-Vector Multiplication on the GPU
In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, the matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that it is essentially a matter of fully utilizing the fine-grained parallelism of the many-...
A Parallel Algebraic Multigrid Solver on Graphics Processing Units
The paper presents a multi-GPU implementation of the preconditioned conjugate gradient algorithm with an algebraic multigrid preconditioner (PCG-AMG) for an elliptic model problem on a 3D unstructured grid. An efficient parallel sparse matrix-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the manycore GPU architecture. A performance comparison of the parallel sol...
Finite Element Matrix Generation on a GPU
This paper presents an efficient technique for fast generation of sparse systems of linear equations arising in computational electromagnetics in a finite element method using higher order elements. The proposed approach employs a graphics processing unit (GPU) for both numerical integration and matrix assembly. The performance results obtained on a test platform consisting of a Fermi GPU (1x T...
Publication date: 2010